AIbase

# Document Image Analysis

Qwen2.5 VL 7B Instruct Quantized.w4a16
Apache-2.0
A quantized version of Qwen2.5-VL-7B-Instruct that accepts image and text input and produces text output, with weights quantized to INT4 and activations kept in FP16.
Image-Text-to-Text Transformers English
RedHatAI
605
3
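
The "w4a16" suffix in the model name above refers to 4-bit weights with 16-bit activations. As a rough illustration of that idea only, the sketch below applies symmetric INT4 quantization to a single weight row and uses it in a dot product with unquantized activations. The function names (`quantize_int4`, `dequantize`, `linear_w4a16`) are hypothetical, the per-row scaling scheme is an assumption, and this is not presented as the procedure used to produce the listed checkpoint. Python floats stand in for FP16 here.

```python
def quantize_int4(weights):
    """Symmetric INT4 quantization of one weight row.

    Returns (ints, scale), where every int lies in [-8, 7].
    Per-row scaling is an assumption made for this sketch.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 7; fall back to 1.0 for an all-zero row.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    ints = [max(-8, min(7, round(w / scale))) for w in weights]
    return ints, scale


def dequantize(ints, scale):
    """Recover approximate floating-point weights from INT4 values."""
    return [v * scale for v in ints]


def linear_w4a16(ints, scale, activations):
    """Dot product of dequantized weights with unquantized activations.

    Only the weights are stored in 4 bits; the activations stay in
    floating point (FP16 on real hardware, plain Python floats here).
    """
    weights = dequantize(ints, scale)
    return sum(w * a for w, a in zip(weights, activations))


row = [7.0, -3.0, 0.0, 1.2]
ints, scale = quantize_int4(row)
output = linear_w4a16(ints, scale, [1.0, 1.0, 1.0, 1.0])
```

Storing each weight in 4 bits instead of 16 cuts weight memory to roughly a quarter, at the cost of a rounding error of at most half the scale per weight in this scheme.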
Paligemma2 3b Ft Docci 448
PaliGemma 2 is an upgraded vision-language model released by Google that combines the Gemma 2 language models with the SigLIP vision encoder and supports multilingual vision-language tasks.
Image-to-Text Transformers
google
8,765
12
Sd3 Long Captioner V2
Apache-2.0
An image-to-text model fine-tuned from the 224x224 version of PaliGemma, specializing in generating detailed descriptions of artistic images.
Image-to-Text Transformers Supports Multiple Languages
gokaygokay
135
25
© 2025 AIbase